debakarr
GitHub Repository: debakarr/machinelearning
Path: blob/master/Part 2 - Regression/Simple Linear Regression/[Python] Simple Linear Regression.ipynb
Kernel: Python 3

Simple Linear Regression

from IPython.display import Image
Image("img/01.png")
Image in a Jupyter notebook

  • b0 is the constant (intercept), representing the base salary of someone who enters the profession with no experience, i.e. Experience = 0

  • b1 is the coefficient representing the slope: the more experience, the higher the salary.

Here in the graph, the black line is the best-fitting line.
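The model y = b0 + b1·x above has a closed-form solution. As a minimal sketch (not part of the original notebook, using made-up toy data), the OLS estimates are b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄:

```python
# Closed-form OLS estimates for y = b0 + b1 * x on a tiny toy dataset.
# The data below is hypothetical and lies exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

# Slope: covariance of x and y divided by variance of x
b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
     sum((x - x_mean) ** 2 for x in xs)
# Intercept: line passes through the point of means
b0 = y_mean - b1 * x_mean

print(b0, b1)  # → 1.0 2.0
```

Because the toy data is perfectly linear, the recovered intercept and slope match exactly.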


Image("img/02.png")
Image in a Jupyter notebook

Actual values vs. model values, and Ordinary Least Squares

Image("img/03.png")
Image in a Jupyter notebook
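The idea behind Ordinary Least Squares is that the fitted line minimizes the sum of squared residuals, Σ(y − ŷ)². A small sketch (toy data, not from the notebook) showing that perturbing the OLS coefficients can only increase that sum:

```python
# Hedged sketch: OLS minimizes the sum of squared residuals (SSE).
# Toy, hypothetical data that is roughly (not exactly) linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 5.9, 8.2, 9.8]

def sse(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Closed-form OLS estimates
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
     sum((x - x_mean) ** 2 for x in xs)
b0 = y_mean - b1 * x_mean

# Any deviation from the OLS line gives a larger (or equal) SSE
assert sse(b0, b1) <= sse(b0, b1 + 0.1)
assert sse(b0, b1) <= sse(b0 + 0.5, b1)
```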

Data Preprocessing

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
%matplotlib inline

# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Feature Scaling
"""from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)"""
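With test_size=0.25, scikit-learn puts ceil(0.25 · n) rows in the test set. A small sketch using synthetic stand-in data (Salary_Data.csv is not included here; the linear relation below is hypothetical) reproduces the 22/8 split seen in the outputs below:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 30 rows, like the salary dataset
X = np.arange(30, dtype=float).reshape(-1, 1)   # stand-in for YearsExperience
y = 9000.0 * X.ravel() + 25000.0                # stand-in for Salary

# Same split parameters as the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

print(X_train.shape, X_test.shape)  # → (22, 1) (8, 1): ceil(30 * 0.25) = 8 test rows
```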
X_train
array([[ 4. ], [ 1.1], [ 2.2], [ 5.1], [ 2.9], [ 4.1], [ 4. ], [ 7.9], [ 1.3], [ 1.5], [ 9. ], [ 2. ], [ 7.1], [ 9.5], [ 5.9], [ 10.5], [ 6.8], [ 3.2], [ 3.9], [ 4.5], [ 6. ], [ 3. ]])
X_test
array([[ 9.6], [ 4.9], [ 8.2], [ 5.3], [ 3.2], [ 3.7], [ 10.3], [ 8.7]])
y_train
array([ 56957., 39343., 39891., 66029., 56642., 57081., 55794., 101302., 46205., 37731., 105582., 43525., 98273., 116969., 81363., 121872., 91738., 54445., 63218., 61111., 93940., 60150.])
y_test
array([ 112635., 67938., 113812., 83088., 64445., 57189., 122391., 109431.])

Fitting Simple Linear Regression to the Training Set

regressor = LinearRegression() # Create an object of the LinearRegression class
regressor.fit(X_train, y_train) # Make the machine learn the correlation from the training data
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
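After fitting, the learned intercept (b0) and slope (b1) are exposed as the `intercept_` and `coef_` attributes. A self-contained sketch on toy data (not the salary dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data lying exactly on y = 1 + 2x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

reg = LinearRegression()
reg.fit(X, y)

# Learned parameters: intercept_ is b0, coef_[0] is b1
print(reg.intercept_, reg.coef_[0])  # → 1.0 2.0 (up to floating-point error)
```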

Predicting the Test set results

y_pred = regressor.predict(X_test)
y_pred
array([ 115439.88180109, 71396.10622651, 102320.45928951, 75144.51265839, 55465.37889103, 60150.88693088, 121999.59305688, 107005.96732936])
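How good are these predictions? A sketch (not in the original notebook) scoring the printed y_test / y_pred values with mean absolute error and R² from sklearn.metrics:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Values copied from the notebook outputs above
y_test = np.array([112635., 67938., 113812., 83088.,
                   64445., 57189., 122391., 109431.])
y_pred = np.array([115439.88180109, 71396.10622651, 102320.45928951,
                   75144.51265839, 55465.37889103, 60150.88693088,
                   121999.59305688, 107005.96732936])

# MAE: average absolute prediction error in salary units
print(mean_absolute_error(y_test, y_pred))
# R^2: fraction of salary variance explained by experience
print(r2_score(y_test, y_pred))
```

On this test set the fit is strong: the average error is a few thousand in salary units and R² is above 0.9.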

Visualising the Training set results

  • X = Years of Experience

  • Y = Salary

plt.scatter(X_train, y_train, c = 'red')
plt.plot(X_train, regressor.predict(X_train), c = 'green')
plt.title('Salary vs Experience (Training Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
Text(0,0.5,'Salary')
Image in a Jupyter notebook

Visualising the Test set results

plt.scatter(X_test, y_test, c = 'red')
# No need to swap X_train for X_test in the line plot: the regressor is one
# fixed line, so plotting it over the training x-values gives the same line
plt.plot(X_train, regressor.predict(X_train), c = 'green')
plt.title('Salary vs Experience (Test Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
Text(0,0.5,'Salary')
Image in a Jupyter notebook